Microprocessor capable of decoding two instructions in parallel.

An instruction fetch unit IU in a microprocessor capable of decoding two instructions in parallel fetches first and second instructions, each of the shortest instruction length, in one cycle. The fetched first instruction is supplied to and decoded by a first instruction decoder ID0, while the fetched second instruction is supplied to and decoded by a second instruction decoder ID1. In a case where an instruction having a bit width longer than the shortest instruction has been fetched by the instruction fetch unit IU, the information to be decoded by the second instruction decoder ID1 is the non-head code of the instruction, and hence a pipeline control unit PCNT invalidates the decoded result of the second instruction decoder ID1. Thus, it is possible to decode the two shortest instructions in parallel, and to eliminate the erroneous information of the decoded result of the second decoder when a non-shortest instruction is fetched.



EP 0 467 152 A2

[FIG. 1: block diagram of the MICROPROCESSOR. Recoverable labels: prefetch queue PFQ; instruction decoders ID0 and ID1 with outputs id0_out and id1_out; expansion part generator EG; pipeline control unit PCNT (control signals for IU, DU, EU, IOU); execution unit EU; 32-bit signal lines d0 to d3; registers R0, R1 ... R15; interface unit IOU with ADDRESS and DATA lines.]

BACKGROUND OF THE INVENTION: 

Field of the Invention: 

The present invention relates to a microproces- 
sor which is capable of processing a variable- 
length instruction set and which decodes a plurality 
of instructions in parallel. 

Description of the Prior Art: 

In any prior-art microprocessor capable of pro- 
cessing a variable-length instruction set, the par- 
allel decode of instructions is not performed. 

As a known example pertinent to the present invention, there is an instruction decoding method described in the article "32-bit microprocessor V80 wherein the disturbance of a pipeline is suppressed by building in a cache and a branch prediction mechanism, etc., thereby to enhance a performance", contained on pp. 195 - 206 of NIKKEI ELECTRONICS BOOKS "New-Generation Microprocessors RISC, CISC, TRON", published on September 11, 1989.

In the known microprocessor, a plurality of instructions are not actually decoded in parallel; instead, an instruction is decoded in two stages, thereby enhancing the throughput of the decode capability. The first-stage decode circuit of this known microprocessor is called a pre-decode unit, which has the function of decomposing a variable-length instruction into elements of fixed length. The instruction decomposed into fixed-length elements in this manner is temporarily stored in a buffer (register) within the pre-decode unit, and it is transferred from the pre-decode unit to an instruction decode unit at the request of the instruction decode unit.

Meanwhile, the official gazette of Japanese Patent Application Laid-open No. 244233/1988 discloses a microprocessor which is intended to shorten the decode time period of a variable-length machine language by decoding a plurality of unit instructions in parallel. With this microprocessor, machine language instructions are accepted from outside 2 bytes at a time, and the unit instruction of the first byte and that of the second byte are respectively decoded by a first decoder and a second decoder. A first selector selects one decoded result from among a plurality of decoded results delivered from the first decoder. A second selector selects one decoded result from among a plurality of decoded results delivered from the second decoder, in accordance with the decode information delivered from the first selector. The select operation of the first selector is determined in accordance with the decode information delivered from the second selector. According to the microprocessor thus constructed, the machine language instructions of 2 bytes can be decoded in one machine cycle, and the decode time period of the variable-length instruction can be shortened.

SUMMARY OF THE INVENTION: 

The inventors' study, however, has revealed that the prior-art technique disclosed in the aforementioned article involves two problems when the processing speed of the microprocessor is to be raised still further.

The first problem is that, since the instruction decode uses two stages in the pipeline, branch processing slows down to a corresponding extent.

That is, in a case where branch processing is started and is followed by fetching and pre-decoding a branch destination instruction, the time period expended on the branch is one stage longer than in a microprocessor which requires only one stage for decode processing.

As the second problem, in a case where the method of decoding an instruction in two stages as in the prior-art technique is adopted in a microprocessor which executes a plurality of instructions in parallel, the pre-decode unit governs the performance of the whole microprocessor. The reason is that, since the instruction to be processed has a variable length, the succeeding instruction cannot be pre-decoded until the pre-decode unit has pre-decoded the preceding instruction. That is, the pre-decode unit can pre-decode only one instruction at a time.

It has also been revealed by the inventors' 
study that there are three solving methods for the 
second problem. 

The first solution is to connect in series a plurality of pre-decode circuits for pre-decoding a plurality of instructions. Herein, each succeeding pre-decode circuit refers to the output of the preceding pre-decode circuit. If the plurality of pre-decode circuits are moreover designed so as to be operable within one cycle, the problem can be solved. In this case, however, the delay time of the pre-decode circuits connected in series becomes problematic.

The second solution is a method in which the pre-decode unit is endowed with a performance capable of pre-decoding one instruction in one cycle, whereupon the difference between the processing performances of the pre-decoder and the instruction decoder is absorbed by a buffer arranged between them. Since, however, the maximum throughput becomes one instruction/cycle with this method, the performance of the microprocessor is not considerably enhanced in spite of the fact that the microprocessor is specially permitted to execute the plurality of instructions at the other stage.

The third solution is that, as in the present invention, the succeeding instruction is decoded by placing an assumption on the format of the preceding instruction.

The present invention has been made by realizing the third solution in practice, and has for its object to provide a microprocessor which can decode a plurality of instructions at high speed and in parallel when processing a variable-length instruction set.

Meanwhile, regarding the prior-art technique disclosed in the aforementioned official gazette of Japanese Patent Application Laid-open, the inventors' study has revealed the following disadvantage: In order to decode all the patterns of the unit instructions, a large number of instructions need to be decoded in parallel in the first and second instruction decoders, and one decoded result needs to be selected from among a large number of decoded results by the selectors. Therefore, the hardware quantities of the instruction decoders and the selectors become enormous.

It is accordingly another object of the present 
invention to provide a microprocessor which can 
decode a plurality of instructions at high speed and 
in parallel while restraining the quantity of its hard- 
ware to the minimum. 

In order to accomplish the objects, according 
to the present invention, an instruction is decoded 
under the assumption of the instruction length 
thereof. 

Subsequently, when the assumption has been 
found correct by the decode of the instruction, the 
decoded result of a succeeding instruction is also 
judged correct. To the contrary, when the assump- 
tion has been found erroneous, the decoded result 
of the succeeding instruction is judged erroneous, 
and it is invalidated. 

Further, the assumed instruction length should desirably be the length of the shortest instruction format in an instruction set. The reason is that the instruction format which is the shortest in the variable-length instruction set corresponds to instructions with a high frequency of use, so the assumption holds good with a high probability.

Besides, in order to decode a plurality of 
instructions in parallel, an instruction prefetch unit 
transfers an instruction code whose length is at 
least double the shortest instruction format, to an 
instruction decode unit. 

In the instruction decode unit, the instruction code is input to the individual instruction decoders in units of the length of the shortest instruction format. Each of the instruction decoders is capable of decoding at least the instructions having the shortest instruction format, and at least one of the instruction decoders is capable of decoding all the instructions of the instruction set. It is also possible to hold the outputs of the respective instruction decoders in output latches different from one another.

A microprocessor according to a typical embodiment of the present invention is outlined as follows:

The microprocessor is characterized by comprising:

(1) a fetch unit (IU) which fetches first and second instructions each having an instruction length of predetermined bit width (16 bits), from outside said microprocessor, and which delivers the first and second instructions to output lines in parallel, said output lines having a bit width (32 bits) that is at least double the predetermined width;

(2) a first instruction decoder (ID0) whose input is supplied with the first instruction on said output lines of said fetch unit (IU);

(3) a second instruction decoder (ID1) whose input is supplied with the second instruction on said output lines of said fetch unit (IU);

(4) a control unit (PCNT) which is supplied with a decoded result (id0_out) of said first instruction decoder and that (id1_out) of said second instruction decoder; and

(5) an instruction execution unit (EU) which responds to an output from said control unit (PCNT);

wherein under a condition under which the first instruction of the predetermined instruction length is delivered from said output lines having the bit width that is at least double the predetermined width, said control unit (PCNT) responds to information on fulfillment of the condition in the decoded result (id0_out) of said first instruction decoder (ID0) and validates the decoded result (id1_out) of said second instruction decoder (ID1), so that said instruction execution unit (EU) executes the first instruction and the second instruction in parallel in response to the decoded results (id0_out, id1_out) of said first and second instruction decoders transmitted as the output of said control unit;

whereas under another condition under which an instruction having an instruction length different from the predetermined bit width is delivered from said output lines of said fetch unit (IU), said control unit (PCNT) responds to information on fulfillment of the other condition in the decoded result (id0_out) of said first decoder (ID0) and invalidates the decoded result (id1_out) of said second decoder (ID1), so that said instruction execution unit (EU) executes the first instruction in response to the decoded result (id0_out) of said first instruction decoder (ID0) transmitted as the output of said control unit (PCNT).

It is decided whether or not the instruction 
codes processed by the instruction decoders cor- 
respond to the instructions which can be decoded 
by the respective instruction decoders (that is, the 
instructions which have the shortest instruction for- 
mat). In a case where, as the result of the decision, 
any of the instruction decoders has decoded the 
instruction having any different instruction format, 
the decoded results of the instruction codes suc- 
ceeding the particular instruction are all invalidated. 
The invalidation is readily realized using a control 
circuit. To the contrary, in a case where, as the 
result of the decision, all the instruction decoders 
have decoded the instructions having the shortest 
instruction format, all the decoded results are valid. 
On this occasion, the throughput of the instruction 
decode is the maximum, and the instructions equal 
in number to the instruction decoders are pro- 
cessed in one cycle. 
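
The decision rule described above can be sketched in a few lines. This is only an illustrative model, not the circuit of the invention: the function name `validity_flags` and the boolean list `is_shortest` are hypothetical, and each decoder is assumed to report whether the code it decoded belongs to the shortest instruction format.

```python
def validity_flags(is_shortest):
    """Return one validity flag per instruction decoder.

    is_shortest[k] is True when decoder k found an instruction of the
    shortest format (hypothetical input; the real signal is part of the
    decoded result).
    """
    flags = []
    valid = True  # the first decoder always sees a real instruction head
    for short in is_shortest:
        flags.append(valid)
        # A non-shortest instruction invalidates every succeeding result.
        valid = valid and short
    return flags


print(validity_flags([True, True]))   # both decoded results are valid
print(validity_flags([False, True]))  # second decoded result is invalidated
```

With two decoders this reduces to the rule used in the embodiment: the second decoded result is valid only when the first decoder found a shortest-format instruction.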

Thus, the maximum throughput of the instruction decode can be made two or more instructions/cycle, though only in the cases where the assumption is correct, and the second problem stated before can be solved. Moreover, since the instruction length is assumed, the variable-length instruction need not be decomposed into the fixed-length elements by the pre-decode circuit, and the first problem stated before can be solved.

In addition, according to the present invention, 
the second instruction decoder executes significant 
decode concerning only the instruction head code 
of the instruction (in other words, one sort of de- 
code), and the insignificant decoded result of the 
second instruction decoder is invalidated under any 
other condition (in other words, in case of a non- 
head code). Therefore, the plurality of instructions 
can be decoded at high speed and in parallel while 
the hardware quantity of the second instruction 
decoder is restrained to the minimum. 

Unlike the pre-decoding method hitherto 
known, the instruction decoding method of the 
present invention decodes an instruction under an 
erroneous assumption in a certain case. In this 
case, the decoded result is invalidated as de- 
scribed above, and the throughput becomes one 
instruction/cycle. In this manner, the processing 
performance depends upon the instruction format 
more in the method of the present invention than in 
the pre-decoding method. This point can be coped 
with in such a way that the instructions which have 
the format fulfilling the assumption are used to the 
utmost in a program. 

Other objects and features of the present invention will become apparent from the ensuing description of embodiments taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS: 

Fig. 1 shows a block diagram of a microprocessor which is an embodiment of the present invention;

Fig. 2 shows the six sorts of instruction lengths of a variable-length instruction set which the microprocessor of the embodiment has;

Fig. 3 shows an example of the row of instructions in a memory as to the instruction set of the embodiment;

Figs. 4(A) and 4(B) show the values of signal lines i0 - i5 in the case where the microprocessor shown in Fig. 1 executes the instruction row in Fig. 3, as to two certain points of time;

Fig. 5 shows a detailed arrangement diagram of a control circuit PCNT which is one of the constituents of the microprocessor in Fig. 1; and

Fig. 6(A) shows the changes of control signals which are generated by instruction decode in the case where the instruction row in Fig. 3 is executed by the microprocessor in Fig. 1, while Fig. 6(B) shows the changes of control signals in the case of employing an architecture in which the microprocessor in Fig. 1 includes only one instruction decoder.

DESCRIPTION OF THE PREFERRED EMBODIMENTS:

Fig. 1 is a block diagram of a microprocessor to which the present invention is applied. The present invention makes it possible to decode a plurality of instructions in parallel. Described here are the internal architecture and operation of a microprocessor which decodes two instructions in parallel, as the simplest aspect of the parallel decode of a plurality of instructions.

Internal Architecture of Microprocessor 



First, the internal architecture of the microprocessor will be described with reference to Fig. 1. The microprocessor in Fig. 1 is basically constructed of an interface unit IOU, an instruction prefetch unit IU, an instruction decode unit DU and an execution unit EU. These units are capable of parallel operations, and pipeline processing is performed under the control of the instruction decode unit DU.

Interface Unit IOU

The microprocessor is connected with external devices (for example, a main memory) through the interface unit IOU. This interface unit IOU transfers both instructions and data between the microprocessor and the main memory.

More specifically, an instruction fetched from the main memory is transferred from the interface unit IOU to the instruction prefetch unit IU through signal lines having a width of 64 bits.

On the other hand, data computed by the execution unit EU is transferred from this execution unit EU to the interface unit IOU through signal lines in two sets each consisting of 32 bits, while data fetched from the main memory is transferred from the interface unit IOU to the instruction decode unit DU.

Instruction Prefetch Unit IU

The instruction prefetch unit IU has a prefetch queue PFQ. The instructions transferred from the interface unit IOU are temporarily latched in the prefetch queue PFQ and aligned in 16-bit units, whereupon the aligned instructions are delivered to the instruction decode unit DU. The prefetch queue PFQ is a FIFO (First-In First-Out) queue.

The instructions after the alignment are transferred from the instruction prefetch unit IU to the instruction decode unit DU through six sets of 16-bit signal lines i0 - i5. Here, the signal line i0 bears the head code of the instruction to be decoded in the next machine cycle, and the signal lines i1 - i5 bear the row of the instructions succeeding the instruction of the signal line i0. The signal line i0 is connected to a first instruction decoder ID0. Similarly, the signal line i1 is connected to a second instruction decoder ID1. It is the feature of the embodiment of the present invention that the input of the second instruction decoder ID1 is uniquely determined by the signal of the signal line i1 and is not selected from among the signals of the signal lines i1 - i5. Besides, the first instruction decoder ID0 has the function of decoding all instructions which can be processed by the microprocessor. In contrast, the second instruction decoder ID1 can decode only instructions in an instruction format having a length of 16 bits or 32 bits, among the instructions which the microprocessor can execute. The decoded results of the instructions in the first instruction decoder ID0 and the second instruction decoder ID1 are respectively delivered to signal lines id0_out and id1_out and then sent to a pipeline control unit PCNT.

Pipeline Control Unit PCNT 

The pipeline control unit PCNT generates control signals for the units IOU, IU and EU on the basis of the signals of the signal lines id0_out and id1_out and signals (not shown in the figure) indicating the statuses of these units IOU, IU and EU.

Expansion Part Generator EG

In addition, the instruction decode unit DU includes an expansion part generator EG, by which immediate data or displacement data in the instructions is expanded to 32 bits and then delivered. The position and length of the immediate data or displacement data in any instruction are designated in the operation code of the instruction, and the data is obtained by decoding the operation code. The expansion part generator EG processes the data on the basis of the designation, and delivers the processed data to a bus d0 or d1. The reason why the expansion part generator EG has two sets of 32-bit output lines is that the data items are transferred independently under the respective controls of the first instruction decoder ID0 and the second instruction decoder ID1.
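
As a rough illustration of what "expanded to 32 bits" can mean, the sketch below widens a short field to 32 bits. The text does not state whether the expansion is a sign extension or a zero extension, so the `signed` parameter, like the function name `expand_to_32`, is an assumption made purely for this example.

```python
def expand_to_32(value, width, signed=True):
    """Expand a `width`-bit immediate or displacement field to 32 bits.

    Assumption: sign extension when signed=True, zero extension otherwise.
    The source only says the data is expanded to 32 bits.
    """
    value &= (1 << width) - 1          # keep only the field bits
    if signed and value & (1 << (width - 1)):
        value -= 1 << width            # interpret the field as negative
    return value & 0xFFFFFFFF          # 32-bit pattern placed on the bus


print(hex(expand_to_32(0xFFFE, 16)))                # sign-extended
print(hex(expand_to_32(0xFFFE, 16, signed=False)))  # zero-extended
```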

Execution Unit EU 

Two integer arithmetic logic units ALU are similarly disposed in the execution unit EU so as to correspond to the first instruction decoder ID0 and the second instruction decoder ID1, respectively.

Register File RF 

A register file RF in the instruction decode unit DU is configured of sixteen 32-bit registers R0 thru R15. Each of the registers has four read ports and two write ports, totaling six ports. Among these ports, one half (two read ports and one write port) corresponds to the side of the first instruction decoder ID0 and is connected to the first arithmetic logic unit ALU0. Likewise, the ports of the other half correspond to the side of the second instruction decoder ID1 and are connected to the second arithmetic logic unit ALU1.

Signal Lines of 32-bit Width 



The instruction decode unit DU and the execution unit EU are connected by six sets of signal lines d0, d1, d2, d3, e0 and e1 each having a width of 32 bits. Among them, the four sets (d0, d1, d2, d3) are used for transferring data from the instruction decode unit DU to the execution unit EU, while the remaining two sets (e0, e1) are used for transferring data from the execution unit EU to the instruction decode unit DU.

By way of example, let's consider a case where the first arithmetic logic unit ALU0 processes the instruction of adding the values of the registers R0 and R1 and then setting the sum in the register R1. In this case, the values of the registers R0 and R1 are first read out from the register file RF and respectively delivered to the 32-bit signal lines d0 and d1. At the execution stage following the instruction decode unit DU in the pipeline, that is, in the execution unit EU, the first arithmetic logic unit ALU0 receives the values from the signal lines d0 and d1 and adds them up. The result of the addition is delivered to the signal lines e0. Further, at the register store stage, which is the next processing, the processing proceeds in the instruction decode unit DU again, and the value on the signal lines e0 is set in the register R1 within the register file RF. The above is an operation employing the side of the first instruction decoder ID0. In case of employing the side of the second instruction decoder ID1, the signal lines d2, d3 and e1 and the second arithmetic logic unit ALU1 are used. More specifically, the values of the registers R0 and R1 are respectively delivered to the signal lines d2 and d3, and they are added by the second arithmetic logic unit ALU1. Thereafter, the result of the addition is transferred to the register R1 by the use of the signal lines e1.
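
The data flow of this register-to-register addition can be traced with a small model. It is a sketch only: the function `add_via_pipe` and the dictionary of bus values are hypothetical, and wrapping the sum to 32 bits is an assumption, since overflow handling is not described in the text.

```python
def add_via_pipe(regs, src, dst, pipe):
    """Trace 'add R[src] to R[dst]' through one of the two pipes.

    pipe 0 models the ID0 side (d0, d1, ALU0, e0);
    pipe 1 models the ID1 side (d2, d3, ALU1, e1).
    Returns the values carried by the signal lines that were used.
    """
    read_a, read_b, result = (("d0", "d1", "e0"), ("d2", "d3", "e1"))[pipe]
    buses = {}
    # Decode stage: register file RF drives the two read buses.
    buses[read_a] = regs[src]
    buses[read_b] = regs[dst]
    # Execute stage: the ALU adds (assumed 32-bit wraparound).
    buses[result] = (buses[read_a] + buses[read_b]) & 0xFFFFFFFF
    # Register store stage: the result returns to RF over e0 or e1.
    regs[dst] = buses[result]
    return buses


regs = [0] * 16          # R0 thru R15
regs[0], regs[1] = 5, 7
print(add_via_pipe(regs, 0, 1, pipe=0), regs[1])
```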

In the case of transferring data between the microprocessor and the memory, signal lines in two sets each consisting of 32 bits, laid between the signal lines e0, e1 and the interface unit IOU, are used. Since the operation of this part is not directly relevant to the present invention, its description is omitted.

The effect of the present invention is that the 
parallel decode of a plurality of instructions be- 
comes possible. In this embodiment, the micropro- 
cessor having the instruction set of variable-length 
instructions will be taken as an example. Therefore, 
what the variable-length instruction is will be first 
explained briefly. 

Variable-length Instruction 

In short, the "variable-length instruction" means an instruction which has a plurality of instruction formats and whose length changes depending on which instruction format is taken. In other words, an instruction set which includes instructions of different lengths is a variable-length instruction set.

Fixed-length Instruction 

In contrast, a case where all instructions have a 
fixed length is generally called an "instruction set 
of fixed length". 

Instruction Set of This Embodiment 



As shown in Fig. 2, this embodiment assumes a set of instructions which have six sorts of lengths, from 16 bits thru 96 bits in 16-bit units. In the memory, the instructions are aligned on 16-bit boundaries. That is, the 16-bit elements of the instructions are all located at addresses of even-numbered bytes. This situation is illustrated in Fig. 3.

Next, the operation of the parallel decode of instructions in this embodiment will be described.

Fig. 3 shows one example of the row of instructions in the memory. The individual instructions are indicated as, for example, inst0 and inst1. The instruction whose length exceeds 16 bits is indicated as, for example, inst2_0 and inst2_1 by further affixing underscores and numerals. That is, the instruction longer than 16 bits is divided into a plurality of elements. It is also assumed that a code which must be subjected to decode processing in each instruction is limited to the head code of the instruction. In other words, it is assumed that the non-head code of each instruction is immediate data or displacement data. In the case of the instruction inst2 by way of example, the first code inst2_0 needs to be decoded, but the succeeding code inst2_1 need not be decoded.

Under the above premises, Figs. 4(A) and 4(B) show the statuses of the 16-bit signal lines i0 - i5 at two certain points of time, the signal lines constituting the transfer bus from the instruction prefetch unit IU to the instruction decode unit DU. Fig. 4(A) illustrates the statuses in which the instruction row in Fig. 3 has already been accepted in the prefetch queue PFQ of the instruction prefetch unit IU, and in which the first instruction inst0 is about to be decoded. In the first half of the next machine cycle, the first instruction inst0 is decoded by the first instruction decoder ID0, and the succeeding instruction inst1 by the second instruction decoder ID1. As the results of the decoding, it is found that the two instructions inst0 and inst1 are both in the instruction format having the shortest length. Then, a command is issued from the instruction decode unit DU to the instruction prefetch unit IU so as to advance the pointer of instructions by 32 bits. In consequence, after a further half machine cycle, the signal lines i0 - i5 between the instruction prefetch unit IU and the instruction decode unit DU fall into the statuses shown in Fig. 4(B), in which the two instructions inst0 and inst1 have been taken away and in which the instructions inst5 and inst6 are added instead. On this occasion, the instruction code inst2_0 is decoded by the first decoder ID0, and the instruction code inst2_1 by the second decoder ID1. As the decoded result of the instruction code inst2_0 in the first decoder ID0, it is found that the instruction inst2 is not in the instruction format having the shortest length.

In a case where the shortest instruction is input to the first instruction decoder ID0, the head operation code of the next instruction is input to the second instruction decoder ID1. The second instruction decoder ID1 decodes its input on the assumption that it is the head operation code of the next instruction. Therefore, in a case where the instruction decoded in the first instruction decoder ID0 is a non-shortest instruction, the instruction decode in the second instruction decoder ID1 is judged erroneous. The judged result of the error is reflected in the output id0_out of the first instruction decoder ID0, and invalidation processing is performed in the pipeline control unit PCNT in response to this judged result. As shown in Fig. 1, the decoded results, namely, the outputs id0_out and id1_out are sent from the first and second instruction decoders ID0, ID1 to the pipeline control unit PCNT. The output id0_out contains information indicating whether or not the instruction decoded in the first decoder ID0 is in the instruction format of the shortest length. On the other hand, the output id1_out may well contain information indicating that an instruction which the second decoder ID1 cannot decode has been input. In this embodiment, however, it is supposed that such information is not contained in the output id1_out.

The decoded result of the second instruction decoder ID1 must be invalidated in conformity with the information contained in the output id0_out which indicates that the length of the instruction having been input to the first instruction decoder ID0 is not the shortest length, namely 16 bits. The processing for the invalidation is performed by the pipeline control unit PCNT as stated above.

Detailed Block Diagram of Pipeline Control Unit PCNT



Fig. 5 shows a detailed block diagram of the 
pipeline control unit PCNT. 

The pipeline control unit PCNT is configured of a pipeline stage control unit Pipe_CNTL, a selector SEL and a no-operation command unit NOP, and it controls the pipeline operation of the whole microprocessor on the basis of the outputs id0_out, id1_out and the statuses of the respective units (IU, DU, EU, IOU). The processing stages in the pipeline processing are controlled by the pipeline stage control unit Pipe_CNTL of the pipeline control unit PCNT in Fig. 5. Besides, the invalidation processing for the output information of the second instruction decoder ID1 is performed on this side of the pipeline stage control unit Pipe_CNTL.

More specifically, the output id1_out of the second instruction decoder ID1 is invalidated as follows: This output id1_out of the second instruction decoder ID1 is supplied to one input of the selector SEL. In this embodiment, another input of the selector SEL is supplied with a fixed value NOP, though the arrangement is not especially restricted thereto. The fixed value NOP has quite the same fields as those of the output id1_out, and affords a non-execution command called "no operation". The value NOP may be either identical to or different from the decoded information of a "nop" instruction which is generally employed as the instruction for commanding no operation. All that is necessary is that the value NOP commands no operation; the size of data to be handled, for example, may be set to any value. The selection of either the value NOP or the output id1_out in the selector SEL is done in accordance with the information id1_valid which is contained in the output id0_out, being the decoded result of the first instruction decoder ID0, and which indicates whether or not the full length of the instruction decoded by the first instruction decoder ID0 is 16 bits. In a case where the instruction length is 16 bits, the output id1_out is selected. To the contrary, in a case where the instruction length exceeds 16 bits, the value NOP is selected. In this way, pipeline control signals pcnt0 and pcnt1 are obtained.
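
The behavior of the selector SEL can be summarized by the following sketch. The dictionary fields (`op`, `length_bits`) and the function name `pipeline_control` are hypothetical stand-ins for the decoded-result fields, which the text does not enumerate; only the selection rule itself is taken from the description.

```python
# Fixed "no operation" value with the same shape as a decoded result
# (field names are assumptions for illustration).
NOP = {"op": "nop"}


def pipeline_control(id0_out, id1_out):
    """Model of SEL: derive pcnt0 and pcnt1 from the decoder outputs."""
    # id1_valid is carried in id0_out: true only for a 16-bit instruction.
    id1_valid = id0_out["length_bits"] == 16
    pcnt0 = id0_out
    pcnt1 = id1_out if id1_valid else NOP
    return pcnt0, pcnt1


short = {"op": "add", "length_bits": 16}
long_ = {"op": "add_disp", "length_bits": 32}
print(pipeline_control(short, {"op": "add", "length_bits": 16}))
print(pipeline_control(long_, {"op": "meaningless"}))  # pcnt1 becomes NOP
```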

Let's suppose the execution of the instruction row in Fig. 3 again. The changes of the pipeline control signals pcnt0 and pcnt1 on this occasion are shown in Fig. 6(A). It should be noted that, unlike Fig. 3, Fig. 6(A) represents time in the vertical direction thereof. By way of example, when the statuses in Fig. 4(A) shift into the statuses in Fig. 4(B), the instructions inst0 and inst1 are decoded. This situation is indicated at the uppermost line in Fig. 6(A). In the next machine cycle, the instruction codes inst2_0 and inst2_1 are decoded, and the decoded result of the instruction code inst2_0 and the fixed value NOP are respectively delivered as the signals pcnt0 and pcnt1. Thenceforth, the execution proceeds similarly, and the instructions inst0 thru inst6 are subjected to the decode processing in 4 machine cycles.

Shown in Fig. 6(B) are the changes of the
control signal pcnt0 in the case of the prior art
where processing similar to the above is performed
using only the first instruction decoder ID0. In this
case of the prior art, 7 machine cycles are required
for the decode processing of the instructions inst0
through inst6 as illustrated in Fig. 6(B).

Thus, in this embodiment, an instruction
decoding capability twice as high is attained at the
peak, and a capability equal to that attained with
the single instruction decoder is attained even in
the worst case.
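The cycle counts above can be checked with a small sketch. The
instruction lengths of Fig. 3 are not reproduced in this passage, so
the list "hypothetical" below is an assumed sequence chosen only to
be consistent with the 4-cycle and 7-cycle figures; the function
names are likewise illustrative. The sketch further assumes that two
instructions are decoded together only when both are of the shortest
(16-bit) length, whereas the text states the condition only for the
first instruction.

```python
from typing import List


def decode_cycles_dual(lengths: List[int]) -> int:
    """Decode cycles with two decoders; lengths are in 16-bit units."""
    cycles = 0
    i = 0
    while i < len(lengths):
        both_short = (
            lengths[i] == 1
            and i + 1 < len(lengths)
            and lengths[i + 1] == 1   # assumption, see lead-in
        )
        # Two instructions advance when both are shortest; otherwise the
        # second decoder's result is treated as NOP and one advances.
        i += 2 if both_short else 1
        cycles += 1
    return cycles


def decode_cycles_single(lengths: List[int]) -> int:
    """Prior-art case: one instruction is decoded per machine cycle."""
    return len(lengths)


# Assumed lengths for inst0 .. inst6 (inst2 occupies two 16-bit codes).
hypothetical = [1, 1, 2, 1, 1, 1, 1]
dual = decode_cycles_dual(hypothetical)
single = decode_cycles_single(hypothetical)
```

Under these assumptions, a sequence consisting only of shortest
instructions takes half as many cycles as the single decoder (the
peak), and a sequence consisting only of longer instructions takes
the same number (the worst case).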

Now, the processing of the instructions inst0,
inst1 and inst2_0, inst2_1 will be described with
more concrete examples. As the examples, it is
assumed that the instruction inst0 is the fixed-length
instruction of adding the values of the registers
R0 and R1 and then setting the result in the
register R1, that the instruction inst1 is the
fixed-length instruction of adding the values of the
registers R2 and R3 and then setting the result in the
register R3, and that the instruction inst2 is the
variable-length instruction of adding displacement
data to the value of the register R4 to obtain an
address and then fetching the data of the address
from the main memory and setting it in the register
R5. Here, the instruction code inst2_0 is the
operation code, and the instruction code inst2_1 is
the displacement data.

First, the processing of the instructions inst0
and inst1 will be described.

The two instructions inst0 and inst1 are
delivered to the 96-bit signal lines laid from the prefetch
queue PFQ, in the manner shown in Fig. 4(A).
Then, the instruction inst0 is decoded by the first
instruction decoder ID0, while the instruction inst1
is decoded by the second instruction decoder ID1.

In this case, it is decided as the result of the
decoding of the instruction inst0 that this instruction
inst0 is the shortest instruction. The result of the
decision is indicated by asserting the signal
id1_valid in the decoded result id0_out. The
outputs id0_out and id1_out are respectively
delivered as the control signals pcnt0 and pcnt1 through
the pipeline control unit PCNT described before.
Subsequently, an operation to be stated below is
performed by the commands of these control
signals.

The value of the register R0 and that of the
register R1 are respectively delivered to the signal
lines d0 and d1 in accordance with the command
of the control signal pcnt0. Simultaneously, the
value of the register R2 and that of the register R3
are respectively delivered to the signal lines d2 and
d3 in accordance with the command of the control
signal pcnt1. Subsequently, the arithmetic logic unit
ALU0 adds the values of the signal lines d0 and d1
and delivers the sum to the signal lines e0, while
the arithmetic logic unit ALU1 adds the values of
the signal lines d2 and d3 and delivers the sum to
the signal lines e1. Further, at the succeeding
stage of register store, the value of the signal lines
e0 is set in the register R1, and the value of the
signal lines e1 in the register R3.
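The data flow for the parallel execution of inst0 and inst1 may be
sketched as below. This is an illustrative model only: the register
file is represented as a Python dictionary, the function name
execute_pair is hypothetical, and the 32-bit wraparound of the sums
is an assumption not stated in this passage.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit data path


def execute_pair(regs: dict) -> dict:
    """Model inst0 (R1 <- R0 + R1) and inst1 (R3 <- R2 + R3) in parallel."""
    # Operand read: pcnt0 drives d0 and d1, pcnt1 drives d2 and d3.
    d0, d1 = regs["R0"], regs["R1"]
    d2, d3 = regs["R2"], regs["R3"]
    # Execute: ALU0 and ALU1 operate in the same machine cycle.
    e0 = (d0 + d1) & MASK32
    e1 = (d2 + d3) & MASK32
    # Register store: e0 is set in R1 and e1 in R3.
    result = dict(regs)
    result["R1"] = e0
    result["R3"] = e1
    return result


after = execute_pair({"R0": 1, "R1": 2, "R2": 10, "R3": 20})
```

The two additions share no signal lines, which is why both control
signals can be acted on in the same cycle.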

Next, the processing operation of the instruc- 
tion inst2 will be described. 

The instruction inst2 is delivered to the 96-bit
signal lines laid from the prefetch queue PFQ, in
the manner shown in Fig. 4(B). Then, the
instruction code inst2_0 is decoded in the first instruction
decoder ID0, while the instruction code inst2_1 is
decoded in the second instruction decoder ID1
under the assumption that it is the head code of
the next instruction.



In this case, it is decided as the result of the
decoding of the instruction code inst2_0 that the
instruction inst2 is a non-shortest instruction. The
result of the decision is indicated by negating the
signal id1_valid in the decoded result id0_out.
The output id0_out is delivered as the control
signal pcnt0 through the pipeline control unit PCNT
described before. Simultaneously, since the signal
id1_valid is negated, the instruction NOP
commanding no operation is selected by the selector
SEL in the pipeline control unit PCNT and is
delivered as the control signal pcnt1. Subsequently, an
operation to be stated below is performed by the
commands of these control signals.

The value of the register R4 is delivered to the
signal lines d0 in accordance with the command of
the control signal pcnt0. Also, the displacement
data inst2_1 of 16 bits is expanded into 32 bits by
the expansion part generator EG, and the expanded
data is delivered to the signal lines d1.

Besides, since the command of the control
signal pcnt1 is the value NOP, no output in
particular is delivered to the signal lines d2 and d3.
Subsequently, the integral arithmetic logic unit
ALU0 adds the values of the signal lines d0 and d1
(for calculating the address) and delivers the sum
to the signal lines e0. The command for the
arithmetic logic unit ALU1 is also the value NOP,
and no output in particular is delivered to the
signal lines e1.

Further, at the succeeding stage, in accordance
with the command of the control signal
pcnt0, that address of the main memory which is
specified by the value of the signal lines e0 is
accessed to fetch an operand, and the fetched data
is set in the register R5. Since the commands of
the control signal pcnt1 for the interface unit IOU
and the instruction decode unit DU (register store)
are the value NOP, the main memory is not
accessed, and no value is transferred from the signal
lines e1 to any register, either.
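The processing of the instruction inst2 may be sketched in the same
manner. The passage states only that the 16-bit displacement is
"expanded into 32 bits" by the expansion part generator EG; the sign
extension used below is an assumption, as are the function names and
the representation of the main memory as a dictionary keyed by
address.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit data path


def expand_16_to_32(disp16: int) -> int:
    """Expand a 16-bit displacement to 32 bits (sign extension assumed)."""
    disp16 &= 0xFFFF
    return (disp16 | 0xFFFF0000) if (disp16 & 0x8000) else disp16


def execute_load(regs: dict, memory: dict, disp16: int) -> dict:
    """Model inst2: R5 <- memory[R4 + displacement]; pcnt1 is NOP."""
    d0 = regs["R4"]                 # delivered under pcnt0
    d1 = expand_16_to_32(disp16)    # inst2_1 passed through EG
    e0 = (d0 + d1) & MASK32         # ALU0 calculates the address
    # pcnt1 is NOP: d2, d3 and e1 are not driven and ALU1 does nothing.
    result = dict(regs)
    result["R5"] = memory[e0]       # operand fetch, then register store
    return result


loaded = execute_load({"R4": 0x1000, "R5": 0}, {0x1010: 42}, 0x0010)
```

Note that the code inst2_1 is used here as data by the first
pipeline, while the decoded result that the second decoder produced
from the same bits has already been replaced by NOP.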

According to this embodiment, the throughput
of the processing of the whole microprocessor is
enhanced, and CPI (the number of machine cycles
required for executing one instruction) can be
rendered less than one.

Moreover, the plurality of instruction decoders
need include only one instruction decoder capable
of decoding all the instruction formats. The
remaining instruction decoders may have merely the
function of decoding the shortest instruction format.
Therefore, the decoding of a plurality of
instructions can be realized with a small quantity of
hardware. This merit also results in reducing the
quantities of processing for testing and diagnosing the
microprocessor and in shortening the time periods
of the processing.

Besides, an instruction code to be input to the
plurality of instruction decoders is uniquely
divided by the length of the shortest instruction
format, and the resulting elements are input to the
respective instruction decoders. That is, the inputs
of all the instruction decoders are selected with
ease. This merit is useful for the realization of a
high speed together with the suppression of the
quantity of hardware.
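The fixed division of the fetched bits can be sketched as follows.
The 96-bit width and the 16-bit shortest length are taken from the
description; the function name and the convention that the first
code occupies the most significant bits are assumptions made for
illustration.

```python
from typing import List


def split_into_codes(bits: int, total_width: int = 96,
                     code_width: int = 16) -> List[int]:
    """Split fetched bits into fixed-width codes, first code = top bits."""
    count = total_width // code_width
    mask = (1 << code_width) - 1
    return [
        (bits >> (total_width - code_width * (k + 1))) & mask
        for k in range(count)
    ]


codes = split_into_codes(0x111122223333444455556666)
id0_in, id1_in = codes[0], codes[1]  # inputs of the first two decoders
```

Because each decoder always receives the same bit positions, no
length-dependent alignment is needed in front of the decoders.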

The embodiment of the present invention is
also applicable to a microprocessor which has a
fixed-length instruction set. More specifically, most
of the plurality of instruction decoders are
permitted to decode only instructions of high frequency
of use, whereby the instruction decoders for
processing the plurality of instructions in parallel,
which have a small quantity of hardware and which
operate at high speed, can be realized.

In addition, irrespective of the fixed-length in- 
struction set and the variable-length instruction set, 
instructions which each instruction decoder is ca- 
pable of decoding can be determined in correspon- 
dence with a circuit which the instruction decoder 
controls. By way of example, an instruction de- 
coder for controlling an arithmetic logic unit is 
capable of decoding only arithmetic logic instruc- 
tions, and for any other instruction, it produces a 
result indicating that it has failed to decode the 
instruction. This measure brings forth the effect 
that the number of signal lines to be laid from the 
instruction decoder to the controlled circuit de- 
creases. 
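A decoder restricted to the instructions of the circuit it controls
may be sketched as below. The opcode values, the position of the
opcode field and the function name are entirely hypothetical; the
passage does not give the instruction encoding.

```python
from typing import Optional

# Hypothetical opcode table for a decoder that controls only an ALU.
ALU_OPCODES = {0x1: "add", 0x2: "sub", 0x3: "and"}


def decode_alu_only(code16: int) -> Optional[str]:
    """Return the ALU operation, or None when decoding has failed."""
    opcode = (code16 >> 12) & 0xF   # assumed 4-bit opcode in the top bits
    # Any instruction outside the table yields the "failed" indication.
    return ALU_OPCODES.get(opcode)
```

Since such a decoder emits only ALU commands or a failure
indication, its output needs fewer signal lines than that of a
decoder covering the whole instruction set.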

The present invention makes it possible to
decode a plurality of fixed-length instructions in
parallel in a variable-length instruction set. As
compared with the prior-art method, accordingly, the
invention enhances the maximum throughput of
instruction decoding.

Claims 

1. A microprocessor comprising: 

a fetch unit which fetches first and second 
instructions each having an instruction length 
of predetermined bit width, from outside said 
microprocessor, and which delivers the first 
and second instructions to output lines in par- 
allel, said output lines having a bit width that is 
at least double the predetermined width; 

a first instruction decoder whose input is 
supplied with the first instruction on said output 
lines of said fetch unit; 

a second instruction decoder whose input 
is supplied with the second instruction on said 
output lines of said fetch unit; 

a control unit which is supplied with a 
decoded result of said first instruction decoder 
and that of said second instruction decoder; 
and 



an instruction execution unit which re- 
sponds to an output from said control unit; 

wherein under a condition under which the
first instruction of the predetermined instruction
length is delivered from said output lines having
the bit width that is at least double the
predetermined width, said control unit responds
to information on fulfillment of the condition
in the decoded result of said first instruction
decoder and validates the decoded result
of said second instruction decoder, so that said
instruction execution unit executes the first
instruction and the second instruction in parallel
in response to the decoded results of said first
and second instruction decoders transmitted
as the output of said control unit;

whereas under another condition under
which an instruction having an instruction
length different from the predetermined bit
width is delivered from said output lines of said
fetch unit, said control unit responds to
information on fulfillment of the other condition in
the decoded result of said first decoder and
invalidates the decoded result of said second
decoder, so that said instruction execution unit
executes the first instruction in response to the
decoded result of said first instruction decoder
transmitted as the output of said control unit.

2. A microprocessor according to claim 1,
wherein when said control unit invalidates the
decoded result of said second decoder, said
instruction execution unit determines an
address of an operand in response to that bit
information of said output lines of said fetch
unit which corresponds to a bit position of the
invalidated decoded result of said second
decoder.

3. A microprocessor according to claim 1,
wherein the predetermined bit width is the
shortest instruction length.

4. A microprocessor according to claim 1,
wherein said control unit includes a selector,
one input and the other input of which are
respectively supplied with the decoded result
of said second instruction decoder and a
non-execution command instruction, and a control
input of which is supplied with information
indicating the fulfillment of the first-mentioned
condition and the fulfillment of the other
condition,

wherein when the first-mentioned condition
is fulfilled, the decoded result of said second
instruction decoder is transmitted to an output
of said selector,

wherein when the other condition is fulfilled,
the non-execution command instruction is
transmitted to said output of said selector, and

wherein the output signal of said selector
is supplied to said instruction execution unit.